ChopLifeGang Group Chat Data Analysis Using Python

This project uses the exported chat history of a WhatsApp group to visualize activity and draw some insights from it.

In [2]:
!pip install emoji
Requirement already satisfied: emoji in c:\users\user\anaconda3\lib\site-packages (0.6.0)
In [7]:
import re
import regex
import pandas as pd
import numpy as np
import emoji
import plotly.express as px
from collections import Counter
import matplotlib.pyplot as plt
from os import path
from PIL import Image
import datetime
from wordcloud import WordCloud, STOPWORDS, ImageColorGenerator
%matplotlib inline

DATA PREPARATION:

We want each raw chat line split into four relevant columns:

19/02/17, 4:44 AM - Pam: Where is everyone?

{Date}, {Time} - {Author}: {Message}

{19/02/17}, {4:44 AM} - {Pam}: {Where is everyone?}

The function below helps us achieve this.

In [11]:
def startsWithDateAndTime(s):
    #Raw string avoids invalid-escape warnings; matches e.g. '7/26/18, 22:51 -'
    pattern = r'^([0-9]+)(\/)([0-9]+)(\/)([0-9]+), ([0-9]+):([0-9]+)[ ]?(AM|PM|am|pm)? -'
    return re.match(pattern, s) is not None
In [12]:
startsWithDateAndTime('7/26/18, 22:51 - Bobby: This message was deleted')
Out[12]:
True
In [13]:
def FindAuthor(s):
    #A chat line has an author if it splits into exactly two parts on ":"
    #e.g. 'Pam: Where is everyone?' -> ['Pam', ' Where is everyone?']
    return len(s.split(":")) == 2
In [14]:
def getDataPoint(line):
    splitLine = line.split(' - ')
    dateTime = splitLine[0]
    date, time = dateTime.split(', ')
    message = ' '.join(splitLine[1:])
    if FindAuthor(message):  #Normal message, e.g. 'Pam: Where is everyone?'
        splitMessage = message.split(': ')
        author = splitMessage[0]
        message = ' '.join(splitMessage[1:])
    else:  #System notification, e.g. 'You were added'
        author = None
    return date, time, author, message
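As a quick sanity check, the two helpers can be exercised on a well-formed sample line (repeated here so the snippet runs standalone):

```python
#Standalone copy of the parsing helpers defined above, for a quick check
def FindAuthor(s):
    #A chat line has an author if it splits into exactly two parts on ":"
    return len(s.split(":")) == 2

def getDataPoint(line):
    splitLine = line.split(' - ')
    date, time = splitLine[0].split(', ')
    message = ' '.join(splitLine[1:])
    author = None
    if FindAuthor(message):
        splitMessage = message.split(': ')
        author = splitMessage[0]
        message = ' '.join(splitMessage[1:])
    return date, time, author, message

print(getDataPoint('19/02/17, 4:44 AM - Pam: Where is everyone?'))
# -> ('19/02/17', '4:44 AM', 'Pam', 'Where is everyone?')
```

System notifications such as 'You were added' have no colon after the timestamp, so the author comes back as None; the parsing loop below relies on this.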
In [16]:
parsedData = []  #List to keep track of data so it can be used by a Pandas dataframe
conversationPath = 'C:/Users/USER/Desktop/Semicolon/MachineLearning/WhatsApp Chat with #ChopLifeGang.txt'
with open(conversationPath, encoding="utf-8") as fp:
    fp.readline()  #Skipping first line of the file because it contains information about end-to-end encryption
    messageBuffer = []
    date, time, author = None, None, None
    while True:
        line = fp.readline()
        if not line:
            break
        line = line.strip()
        if startsWithDateAndTime(line):
            if len(messageBuffer) > 0:
                parsedData.append([date, time, author, ' '.join(messageBuffer)])
            messageBuffer.clear()
            date, time, author, message = getDataPoint(line)
            messageBuffer.append(message)
        else:
            messageBuffer.append(line)
In [17]:
df = pd.DataFrame(parsedData, columns=['Date', 'Time', 'Author', 'Message']) #Initialising a Pandas dataframe
df["Date"] = pd.to_datetime(df["Date"])
In [18]:
df.head(2)
Out[18]:
Date Time Author Message
0 2020-01-20 8:07 PM None Ahmad Pablo El Jefe created group "#ChopLifeGang"
1 2020-01-20 8:07 PM None You were added
In [19]:
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 12650 entries, 0 to 12649
Data columns (total 4 columns):
Date       12650 non-null datetime64[ns]
Time       12650 non-null object
Author     12420 non-null object
Message    12650 non-null object
dtypes: datetime64[ns](1), object(3)
memory usage: 395.4+ KB

Group members

In [20]:
df.Author.unique()
Out[20]:
array([None, 'Engr A . S . Jibrin CLG', 'Abdul Xzeecool', 'Slymstar CLG',
       'FBI Vibe CLG', 'Cupcake CLG', 'Kim CLG', 'Fati CLG Thriftshop',
       'Vynze Cent', 'Mjay CLG', 'Timbyen CLG', 'Trust CLG',
       'Ahmad Pablo El Jefe', 'Abdulsalam CLG', 'Nanbyet CLG',
       'Lamba CLG', 'Caspeezie', 'Zazi CLG', 'Sir Max CLG',
       'Eli Shoe Plug', 'Abbie CLG', 'Elhassan CLG', 'Ninah CLG',
       'Ekpene CLG', 'Xeey CLG', 'Jayde CLG', 'Yo', 'Liu CLG', 'Jem CLG'],
      dtype=object)

None appears first in the array: system notifications (the group being created, members being added) carry no author. Let us remove the messages tagged under None.

In [21]:
df = df.dropna()
df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 12420 entries, 2 to 12649
Data columns (total 4 columns):
Date       12420 non-null datetime64[ns]
Time       12420 non-null object
Author     12420 non-null object
Message    12420 non-null object
dtypes: datetime64[ns](1), object(3)
memory usage: 485.2+ KB
In [22]:
df.Author.unique()
Out[22]:
array(['Engr A . S . Jibrin CLG', 'Abdul Xzeecool', 'Slymstar CLG',
       'FBI Vibe CLG', 'Cupcake CLG', 'Kim CLG', 'Fati CLG Thriftshop',
       'Vynze Cent', 'Mjay CLG', 'Timbyen CLG', 'Trust CLG',
       'Ahmad Pablo El Jefe', 'Abdulsalam CLG', 'Nanbyet CLG',
       'Lamba CLG', 'Caspeezie', 'Zazi CLG', 'Sir Max CLG',
       'Eli Shoe Plug', 'Abbie CLG', 'Elhassan CLG', 'Ninah CLG',
       'Ekpene CLG', 'Xeey CLG', 'Jayde CLG', 'Yo', 'Liu CLG', 'Jem CLG'],
      dtype=object)

We have successfully removed the None author!

Group Wise Statistics

In [23]:
total_messages = df.shape[0]
print (total_messages)
12420

Let us find out the total number of media messages

In [24]:
media_messages = df[df['Message'] == '<Media omitted>'].shape[0]
print(media_messages)
1548
In [26]:
def split_count(text):
    emoji_list = []
    #\X matches full grapheme clusters, keeping skin-tone and ZWJ
    #emoji sequences together as single units
    data = regex.findall(r'\X', text)
    for word in data:
        if any(char in emoji.UNICODE_EMOJI for char in word):
            emoji_list.append(word)
    return emoji_list

df["emoji"] = df["Message"].apply(split_count)
In [28]:
emojis = sum(df['emoji'].str.len())
print(emojis)
7582
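Note that `emoji.UNICODE_EMOJI` exists in the emoji 0.6.0 release installed above but was removed in later versions of the package. As a rough dependency-free fallback, a codepoint-range check can stand in (an approximation of my own, not the notebook's method: it misses some emoji and does not group skin-tone/ZWJ sequences the way `\X` does):

```python
#Rough emoji detector using common Unicode emoji blocks.
#Approximation only: misses keycaps and some symbols, and treats each
#codepoint separately rather than grouping ZWJ/skin-tone sequences.
EMOJI_RANGES = [
    (0x1F300, 0x1FAFF),  #pictographs, emoticons, transport, extended-A
    (0x2600, 0x27BF),    #miscellaneous symbols and dingbats
    (0x1F1E6, 0x1F1FF),  #regional indicators (flags)
]

def extract_emojis(text):
    return [ch for ch in text
            if any(lo <= ord(ch) <= hi for lo, hi in EMOJI_RANGES)]

print(extract_emojis('lol 😂😂 ok 🤣'))
# -> ['😂', '😂', '🤣']
```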
In [29]:
URLPATTERN = r'(https?://\S+)'
df['urlcount'] = df.Message.apply(lambda x: re.findall(URLPATTERN, x)).str.len()
In [30]:
links = np.sum(df.urlcount)
In [31]:
print("Group Wise Statistics")
print("Messages:", total_messages)
print("Media:", media_messages)
print("Emojis:", emojis)
print("Links:", links)
Group Wise Statistics
Messages: 12420
Media: 1548
Emojis: 7582
Links: 9

Let us separate the media messages from the text messages

In [32]:
media_messages_df = df[df['Message'] == '<Media omitted>']
In [33]:
messages_df = df.drop(media_messages_df.index)
In [34]:
messages_df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 10872 entries, 2 to 12649
Data columns (total 6 columns):
Date        10872 non-null datetime64[ns]
Time        10872 non-null object
Author      10872 non-null object
Message     10872 non-null object
emoji       10872 non-null object
urlcount    10872 non-null int64
dtypes: datetime64[ns](1), int64(1), object(4)
memory usage: 594.6+ KB

Let's add two new columns to our dataframe giving the number of letters and words in each message. We will name them "Letter_Count" and "Word_Count" respectively.

In [35]:
messages_df['Letter_Count'] = messages_df['Message'].apply(lambda s : len(s))
messages_df['Word_Count'] = messages_df['Message'].apply(lambda s : len(s.split(' ')))
messages_df["MessageCount"]=1
In [36]:
messages_df.head(2)
Out[36]:
Date Time Author Message emoji urlcount Letter_Count Word_Count MessageCount
2 2020-10-08 9:09 AM Engr A . S . Jibrin CLG Good morning [] 0 12 2 1
3 2020-10-08 9:31 AM Abdul Xzeecool Good morning fam [] 0 16 3 1
In [37]:
messages_df["emojicount"]= df['emoji'].str.len()
In [38]:
#Creates a list of unique Authors
l = messages_df.Author.unique()

for i in range(len(l)):
    #filtering out messages of particular user
    req_df = messages_df[messages_df['Author'] == l[i]]
    
    #req_df will contain messages of only one particular user
    print(f'Stats of {l[i]} -')
    
    #shape will print number of rows which also means the number of messages
    print('Messages Sent', req_df.shape[0])
    
    #Word_Count contains total words in one message. Sum of all words/Total messages will provide words per message
    words_per_message = (np.sum(req_df['Word_Count']))/req_df.shape[0]
    print('Words per message', words_per_message)
    
    #media consists of media messages
    media = media_messages_df[media_messages_df['Author'] == l[i]].shape[0]
    print('Media Messages Sent', media)
    
    #emojis counts the emoji characters this user sent
    emojis = sum(req_df['emoji'].str.len())
    print('Emojis Sent', emojis)
    
    #links consists of total links
    links = sum(req_df["urlcount"])
    print('Links Sent', links)
    print()
Stats of Engr A . S . Jibrin CLG -
Messages Sent 693
Words per message 5.854256854256854
Media Messages Sent 80
Emojis Sent 112
Links Sent 0

Stats of Abdul Xzeecool -
Messages Sent 1264
Words per message 6.28876582278481
Media Messages Sent 73
Emojis Sent 417
Links Sent 0

Stats of FBI Vibe CLG -
Messages Sent 249
Words per message 4.329317269076305
Media Messages Sent 17
Emojis Sent 135
Links Sent 1

Stats of Cupcake CLG -
Messages Sent 434
Words per message 7.587557603686636
Media Messages Sent 89
Emojis Sent 285
Links Sent 0

Stats of Kim CLG -
Messages Sent 657
Words per message 6.926940639269406
Media Messages Sent 59
Emojis Sent 490
Links Sent 0

Stats of Fati CLG Thriftshop -
Messages Sent 882
Words per message 5.98639455782313
Media Messages Sent 274
Emojis Sent 1300
Links Sent 1

Stats of Slymstar CLG -
Messages Sent 501
Words per message 6.632734530938124
Media Messages Sent 266
Emojis Sent 705
Links Sent 0

Stats of Vynze Cent -
Messages Sent 324
Words per message 9.512345679012345
Media Messages Sent 62
Emojis Sent 36
Links Sent 4

Stats of Mjay CLG -
Messages Sent 790
Words per message 4.821518987341772
Media Messages Sent 56
Emojis Sent 330
Links Sent 0

Stats of Trust CLG -
Messages Sent 633
Words per message 7.837282780410742
Media Messages Sent 54
Emojis Sent 655
Links Sent 0

Stats of Ahmad Pablo El Jefe -
Messages Sent 756
Words per message 7.28042328042328
Media Messages Sent 33
Emojis Sent 172
Links Sent 1

Stats of Abdulsalam CLG -
Messages Sent 77
Words per message 8.597402597402597
Media Messages Sent 25
Emojis Sent 27
Links Sent 0

Stats of Nanbyet CLG -
Messages Sent 89
Words per message 3.831460674157303
Media Messages Sent 3
Emojis Sent 126
Links Sent 0

Stats of Lamba CLG -
Messages Sent 72
Words per message 9.875
Media Messages Sent 12
Emojis Sent 53
Links Sent 0

Stats of Caspeezie -
Messages Sent 719
Words per message 5.851182197496523
Media Messages Sent 17
Emojis Sent 659
Links Sent 0

Stats of Timbyen CLG -
Messages Sent 166
Words per message 3.63855421686747
Media Messages Sent 50
Emojis Sent 17
Links Sent 0

Stats of Zazi CLG -
Messages Sent 102
Words per message 5.245098039215686
Media Messages Sent 6
Emojis Sent 32
Links Sent 0

Stats of Sir Max CLG -
Messages Sent 74
Words per message 5.513513513513513
Media Messages Sent 6
Emojis Sent 18
Links Sent 0

Stats of Eli Shoe Plug -
Messages Sent 156
Words per message 7.871794871794871
Media Messages Sent 48
Emojis Sent 113
Links Sent 0

Stats of Abbie CLG -
Messages Sent 208
Words per message 5.413461538461538
Media Messages Sent 6
Emojis Sent 162
Links Sent 1

Stats of Elhassan CLG -
Messages Sent 38
Words per message 6.842105263157895
Media Messages Sent 14
Emojis Sent 72
Links Sent 0

Stats of Ninah CLG -
Messages Sent 562
Words per message 6.733096085409253
Media Messages Sent 90
Emojis Sent 275
Links Sent 1

Stats of Ekpene CLG -
Messages Sent 550
Words per message 6.14
Media Messages Sent 36
Emojis Sent 453
Links Sent 0

Stats of Xeey CLG -
Messages Sent 376
Words per message 4.7313829787234045
Media Messages Sent 117
Emojis Sent 404
Links Sent 0

Stats of Jayde CLG -
Messages Sent 299
Words per message 4.351170568561873
Media Messages Sent 46
Emojis Sent 396
Links Sent 0

Stats of Yo -
Messages Sent 165
Words per message 13.23030303030303
Media Messages Sent 1
Emojis Sent 99
Links Sent 0

Stats of Liu CLG -
Messages Sent 15
Words per message 6.333333333333333
Media Messages Sent 1
Emojis Sent 0
Links Sent 0

Stats of Jem CLG -
Messages Sent 21
Words per message 4.285714285714286
Media Messages Sent 7
Emojis Sent 39
Links Sent 0

Emoji Stats

Unique emojis used in the group

In [39]:
total_emojis_list = list(set([a for b in messages_df.emoji for a in b]))
total_emojis = len(total_emojis_list)
print(total_emojis)
205

Most used emojis

In [41]:
total_emojis_list = list([a for b in messages_df.emoji for a in b])
emoji_dict = dict(Counter(total_emojis_list))
emoji_dict = sorted(emoji_dict.items(), key=lambda x: x[1], reverse=True)
print(emoji_dict)
[('😂', 2740), ('🤣', 1314), ('😭', 416), ('😅', 279), ('😩', 164), ('🙄', 160), ('💔', 123), ('🙌🏽', 111), ('😁', 110), ('😫', 99), ('🌝', 88), ('😍', 82), ('😘', 80), ('🤔', 76), ('😒', 73), ('🥺', 62), ('🙌🏿', 60), ('👀', 58), ('😑', 55), ('🥰', 53), ('😳', 50), ('☹', 42), ('🔥', 38), ('🚶🏿\u200d♂️', 37), ('🙏', 35), ('❤️', 33), ('🌚', 32), ('✌️', 27), ('👍', 27), ('🤧', 27), ('😃', 25), ('🤭', 25), ('💃🏻', 24), ('😢', 21), ('💃🏾', 21), ('💆🏿\u200d♂️', 21), ('🇳🇬', 21), ('😎', 20), ('😄', 19), ('💯', 19), ('🤗', 19), ('🤦🏿\u200d♂', 19), ('🧐', 18), ('🏃\u200d♂️', 17), ('💋', 17), ('😶', 17), ('😞', 15), ('🙆🏿\u200d♂️', 15), ('😨', 15), ('✊🏿', 14), ('😱', 14), ('🏃', 14), ('🤦🏾\u200d♂️', 14), ('😋', 14), ('😕', 14), ('🙏🏿', 13), ('😓', 13), ('🙃', 12), ('❗', 12), ('🤝🏿', 12), ('🙌', 11), ('👌🏿', 11), ('🤦🏽\u200d♀️', 11), ('😏', 10), ('👍🏿', 10), ('😤', 10), ('🤷\u200d♂️', 9), ('✊🏽', 9), ('🥳', 9), ('🥴', 9), ('🚶🏽\u200d♂️', 9), ('🤪', 9), ('📌', 9), ('☹️', 8), ('🚶🏻\u200d♀️', 8), ('❣️', 8), ('🤦🏼\u200d♀️', 8), ('🚶\u200d♀️', 7), ('😲', 7), ('💨', 7), ('🤐', 7), ('😉', 7), ('🙆', 7), ('🤡', 7), ('🙆🏻\u200d♂️', 7), ('😖', 7), ('😈', 7), ('🤷🏿\u200d♂️', 6), ('💪🏼', 6), ('♥️', 6), ('🙈', 6), ('✌🏿', 6), ('\U0001f971', 6), ('🤞🏿', 6), ('😡', 6), ('✊🏾', 5), ('\U0001f90e', 5), ('🙌🏻', 5), ('🤦', 5), ('👇', 5), ('😐', 5), ('🙇🏽\u200d♀', 4), ('🤦🏽\u200d♂', 4), ('🖐🏻', 4), ('👆', 4), ('😀', 4), ('💃🏼', 4), ('🖐️', 4), ('💃', 4), ('🤨', 4), ('🙁', 4), ('🙆🏾\u200d♀️', 4), ('🤦🏽\u200d♂️', 4), ('🤮', 4), ('🏃🏿\u200d♂️', 3), ('🙇🏿\u200d♂️', 3), ('🙏🏻', 3), ('❌', 3), ('🤷🏽\u200d♀️', 3), ('🏃🏾', 3), ('💪🏾', 3), ('✊', 3), ('😊', 3), ('🙌🏼', 3), ('🍿', 3), ('✌🏻', 3), ('👅', 3), ('🕺🏻', 3), ('🗣️', 3), ('😣', 3), ('👌🏽', 3), ('🤲🏻', 3), ('❤', 3), ('🚶🏾', 3), ('👨🏿\u200d\U0001f9af', 3), ('💃🏿', 2), ('😥', 2), ('✊🏼', 2), ('🦁', 2), ('💛', 2), ('🖤', 2), ('😧', 2), ('🤯', 2), ('⚡', 2), ('🏃🏽\u200d♂️', 2), ('😝', 2), ('🙆🏽\u200d♂', 2), ('🙏🏾', 2), ('😌', 2), ('🥂', 2), ('🚶🏾\u200d♀️', 2), ('🚶\u200d♂️', 2), ('🏃🏽\u200d♀️', 2), ('✔', 2), ('💁🏼\u200d♀️', 2), ('👨🏾\u200d\U0001f9af', 2), ('💤', 2), ('🚮', 2), ('😰', 2), ('🤢', 
2), ('🏃\u200d♀️', 2), ('🚇', 1), ('🤝🏾', 1), ('🙌🏾', 1), ('🤦🏻\u200d♂', 1), ('🧡', 1), ('\U0001f90d', 1), ('💙', 1), ('🥵', 1), ('⚽', 1), ('🐸', 1), ('🚶🏼\u200d♀️', 1), ('✌🏾', 1), ('👑', 1), ('☺', 1), ('😆', 1), ('😛', 1), ('🎉', 1), ('🏃🏾\u200d♀️', 1), ('🤦\u200d♂️', 1), ('🌞', 1), ('☝🏾', 1), ('😟', 1), ('😔', 1), ('🌹', 1), ('👍🏼', 1), ('😴', 1), ('✌🏽', 1), ('♥', 1), ('💖', 1), ('🍆', 1), ('🤫', 1), ('👉🏿', 1), ('💀', 1), ('✌', 1), ('🤤', 1), ('🧘', 1), ('🚛', 1), ('🗽', 1), ('🎠', 1), ('🚢', 1), ('🧵', 1), ('🙆🏽\u200d♀️', 1), ('✨', 1), ('💦', 1)]
In [42]:
emoji_df = pd.DataFrame(emoji_dict, columns=['emoji', 'count'])
emoji_df
Out[42]:
emoji count
0 😂 2740
1 🤣 1314
2 😭 416
3 😅 279
4 😩 164
... ... ...
200 🚢 1
201 🧵 1
202 🙆🏽‍♀️ 1
203 1
204 💦 1

205 rows × 2 columns

Emoji distribution visualization

In [43]:
fig = px.pie(emoji_df, values='count', names='emoji')
fig.update_traces(textposition='inside', textinfo='percent+label')
fig.show()

Person-wise Emoji distribution

In [45]:
l = messages_df.Author.unique()
for i in range(len(l)):
    dummy_df = messages_df[messages_df['Author'] == l[i]]
    total_emojis_list = list([a for b in dummy_df.emoji for a in b])
    emoji_dict = dict(Counter(total_emojis_list))
    emoji_dict = sorted(emoji_dict.items(), key=lambda x: x[1], reverse=True)
    print('Emoji Distribution for', l[i])
    author_emoji_df = pd.DataFrame(emoji_dict, columns=['emoji', 'count'])
    fig = px.pie(author_emoji_df, values='count', names='emoji')
    fig.update_traces(textposition='inside', textinfo='percent+label')
    fig.show()
Emoji Distribution for Engr A . S . Jibrin CLG
Emoji Distribution for Abdul Xzeecool
Emoji Distribution for FBI Vibe CLG
Emoji Distribution for Cupcake CLG
Emoji Distribution for Kim CLG
Emoji Distribution for Fati CLG Thriftshop
Emoji Distribution for Slymstar CLG
Emoji Distribution for Vynze Cent
Emoji Distribution for Mjay CLG
Emoji Distribution for Trust CLG
Emoji Distribution for Ahmad Pablo El Jefe
Emoji Distribution for Abdulsalam CLG
Emoji Distribution for Nanbyet CLG
Emoji Distribution for Lamba CLG
Emoji Distribution for Caspeezie
Emoji Distribution for Timbyen CLG
Emoji Distribution for Zazi CLG
Emoji Distribution for Sir Max CLG
Emoji Distribution for Eli Shoe Plug
Emoji Distribution for Abbie CLG
Emoji Distribution for Elhassan CLG
Emoji Distribution for Ninah CLG
Emoji Distribution for Ekpene CLG
Emoji Distribution for Xeey CLG
Emoji Distribution for Jayde CLG
Emoji Distribution for Yo
Emoji Distribution for Liu CLG
Emoji Distribution for Jem CLG

Day-wise Distribution

In [46]:
def weekday_name(i):
    #pandas weekday: 0 = Monday ... 6 = Sunday
    days = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]
    return days[i]
day_df = pd.DataFrame(messages_df["Message"])
day_df['day_of_date'] = messages_df['Date'].dt.weekday
day_df['day_of_date'] = day_df["day_of_date"].apply(weekday_name)
day_df["messagecount"] = 1
day = day_df.groupby("day_of_date").sum()
day.reset_index(inplace=True)
In [52]:
fig = px.line_polar(day, r='messagecount', theta='day_of_date', line_close=True)
fig.update_traces(fill='toself')
fig.update_layout(
  polar=dict(
    radialaxis=dict(
      visible=True
    ),
  ),
  showlegend=False
)

fig.show()

We can notice that the most messages were posted on Mondays, most likely followed by Thursdays!
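Rather than eyeballing the polar chart, the ranking can be read off directly by sorting per-day counts. A small sketch on toy data (the real notebook would use `messages_df` instead of the stand-in frame):

```python
import pandas as pd

DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
        "Friday", "Saturday", "Sunday"]

#Toy stand-in for messages_df: one row per message with its date
toy = pd.DataFrame({"Date": pd.to_datetime(
    ["2020-10-26", "2020-10-26", "2020-10-22", "2020-10-20"])})
toy["day_of_date"] = toy["Date"].dt.weekday.map(lambda i: DAYS[i])

#value_counts sorts descending, so the busiest day comes first
counts = toy["day_of_date"].value_counts()
print(counts)
```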

Number of messages over time

In [53]:
date_df = messages_df.groupby("Date").sum()
date_df.reset_index(inplace=True)
fig = px.line(date_df, x="Date", y="MessageCount")
fig.update_xaxes(nticks=20)
fig.show()

Top Chatter

In [55]:
auth = messages_df.groupby("Author").sum()
auth.reset_index(inplace=True)
fig = px.bar(auth, y="Author", x="MessageCount", color='Author', orientation="h",
             color_discrete_sequence=["red", "green", "blue", "goldenrod", "magenta"],
             title="Explicit color sequence"
            )

fig.show()

Our top chatter happens to be Abdul Xzeecool, with more than 1,000 messages!!! Congratulations

When are the group members most active?

In [57]:
messages_df['Time'].value_counts().head(15).plot.barh()
plt.xlabel('Number of messages')
plt.ylabel('Time')
Out[57]:
Text(0, 0.5, 'Time')

As noticed above, the group comes alive mostly at night, at around 9:00 PM.
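The raw Time strings mix 12-hour values like '9:09 AM' with occasional 24-hour ones, so bucketing by hour gives a cleaner picture than counting exact timestamps. A hedged sketch using only the standard library (the format strings are assumptions about the export, not guaranteed):

```python
from collections import Counter
from datetime import datetime

def hour_of(time_str):
    #Try 12-hour times like '9:09 AM' first, then 24-hour like '22:51'
    for fmt in ('%I:%M %p', '%H:%M'):
        try:
            return datetime.strptime(time_str.strip(), fmt).hour
        except ValueError:
            continue
    return None  #unparseable time string

times = ['9:09 AM', '9:31 AM', '8:07 PM', '22:51']
print(Counter(hour_of(t) for t in times))
# -> Counter({9: 2, 20: 1, 22: 1})
```

In the notebook this would be applied as `messages_df['Time'].apply(hour_of).value_counts()` before plotting.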

The most active day was:

In [58]:
messages_df['Date'].value_counts().head(12).plot.barh()
print(messages_df['Date'].value_counts())
plt.xlabel('Number of Messages')
plt.ylabel('Date')
2020-10-26    1405
2020-10-22    1014
2020-10-20     846
2020-10-13     828
2020-10-24     823
2020-10-25     733
2020-10-15     694
2020-10-12     654
2020-10-23     644
2020-10-14     618
2020-10-21     434
2020-10-19     406
2020-10-16     319
2020-10-08     284
2020-10-18     282
2020-10-09     246
2020-10-11     239
2020-10-17     218
2020-10-10     134
2020-10-27      51
Name: Date, dtype: int64
Out[58]:
Text(0, 0.5, 'Date')

The most active day coincidentally happened to be just yesterday, 26th October 2020. I guess the lectures @VynzeCent was giving really had everyone involved!!!

Word Cloud

A word cloud is a visual representation of words in a particular text. The size of words in the word cloud is directly proportional to the frequency of that word in a text. We will create a word cloud for all the messages in the group.

In [60]:
text = " ".join(review for review in messages_df.Message)
print("There are {} characters in all the messages.".format(len(text)))
There are 362784 characters in all the messages.
In [61]:
stopwords = set(STOPWORDS)
stopwords.update(["ra", "ga", "na", "ani", "em", "ki", "ah","ha","la","eh","ne","le"])

#Generates a word cloud image
wordcloud = WordCloud(stopwords=stopwords, background_color="white").generate(text)

#Display the generated image, the matplotlib way:
plt.figure(figsize=(10,5))
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis("off")
plt.show()

So far, "dey" and "go" are the most used words in the ChopLifeGang group chat, which is why they appear largest in the cloud!!

Conclusion

We can conclude that some interesting details can be derived from even this small data analysis. This is still a work in progress, so feel free to make use of the code and try out your own analysis too!!!